A Scaffold Analysis Tool Using Mate-Pair Information in Genome Sequencing

Authors

  • Pan-Gyu Kim
  • Hwan-Gue Cho
  • Kiejung Park

Abstract

We have developed ConPath, a Windows-based scaffold analysis program. ConPath constructs scaffolds by ordering and orienting separate sequence contigs using the mate-pair information that links contig pairs. Our algorithm builds directed graphs from this link information and traverses them to find the longest acyclic graphs. Using end-read pairs from fixed-size mate-pair libraries, ConPath determines the relative orientations of all contigs, estimates the gap size between each pair of adjacent contigs, and reports misassemblies detected by validating orientations and gap sizes. We have used ConPath in more than 10 microbial genome projects, including Mannheimia succiniciproducens and Vibrio vulnificus, where it verified contig assemblies and identified several erroneous contigs using the four error types defined in ConPath. ConPath also provides convenience features and viewers for inspecting each contig in detail, including a contig viewer, a scaffold viewer, an edge information list, a mate-pair list, and printing of complex scaffold structures.
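The abstract describes three steps: building a directed contig graph from mate-pair links, estimating gap sizes from a fixed library insert size, and traversing the graph for a longest acyclic ordering. The Python sketch below illustrates those ideas under stated assumptions; it is not ConPath's implementation. The link tuple layout, the helper names (`estimate_gap`, `build_graph`, `longest_path`), the `min_support` threshold, and the choice to rank paths by total contig length are all hypothetical. Contig orientation (reverse-complemented contigs), cycle breaking, and ConPath's four error types are deliberately left out.

```python
from collections import defaultdict, deque


def estimate_gap(insert_size, left_len, pos_left, end_right):
    """Gap implied by one mate pair, assuming
    insert_size = (left_len - pos_left) + gap + end_right."""
    return insert_size - (left_len - pos_left) - end_right


def build_graph(links, lengths, insert_size, min_support=2):
    """Directed contig graph {left: {right: mean_gap}}.

    Each link is a hypothetical tuple (left, pos_left, right, end_right): one read
    maps forward on `left` starting at pos_left, its mate maps reverse on `right`
    ending at end_right. An edge is kept only if >= min_support pairs support it.
    """
    gaps = defaultdict(list)
    for left, pos_left, right, end_right in links:
        gaps[(left, right)].append(
            estimate_gap(insert_size, lengths[left], pos_left, end_right))
    graph = defaultdict(dict)
    for (left, right), values in gaps.items():
        if len(values) >= min_support:
            graph[left][right] = sum(values) / len(values)  # mean gap estimate
    return graph


def longest_path(graph, lengths):
    """Contig order with the largest total length, via topological sort + DP.

    Raises ValueError if the graph has a cycle (this sketch does not break cycles).
    """
    indegree = {n: 0 for n in lengths}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []  # Kahn's algorithm: nodes in topological order
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph.get(u, {}):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(lengths):
        raise ValueError("contig graph contains a cycle")
    best = {n: lengths[n] for n in lengths}  # best path length ending at n
    prev = {n: None for n in lengths}
    for u in order:
        for v in graph.get(u, {}):
            if best[u] + lengths[v] > best[v]:
                best[v] = best[u] + lengths[v]
                prev[v] = u
    node = max(order, key=best.get)
    path = []
    while node is not None:  # walk predecessors back to the path start
        path.append(node)
        node = prev[node]
    return path[::-1]


if __name__ == "__main__":
    lengths = {"c1": 5000, "c2": 3000, "c3": 4000, "c4": 1000}
    links = [
        ("c1", 3500, "c2", 1400), ("c1", 3600, "c2", 1550),  # gaps 100, 50
        ("c2", 1200, "c3", 1100), ("c2", 1300, "c3", 1300),  # gaps 100, 0
        ("c1", 4000, "c4", 500),  # single link, below min_support
    ]
    graph = build_graph(links, lengths, insert_size=3000)
    print(dict(graph))  # {'c1': {'c2': 75.0}, 'c2': {'c3': 50.0}}
    print(longest_path(graph, lengths))  # ['c1', 'c2', 'c3']
```

Requiring several supporting mate pairs per edge is one simple way to suppress spurious links from chimeric pairs; averaging per-pair gap estimates mirrors the idea of inferring gap size from a fixed-size library, though a real tool would also check each estimate against the library's insert-size spread.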

Journal title:
  • Journal of Biomedicine and Biotechnology

Volume 2008

Publication date: 2008